1.  INTRODUCTION


Monte Carlo studies are experiments that use simulated data. Like
any experiment, they should be designed to minimize extraneous
variation. However, the Monte Carlo experimenter, unlike the designer of
traditional field experiments, usually knows and can control the
stochastic structure of the simulated data. "Swindles" or variance
reduction techniques exploit this knowledge to construct more precise
estimates of the unknown parameters or to reduce the number of
simulation runs (and thus the cost) necessary to attain some desired
level of precision. Because swindle designs use information that is not
usually available in field experiments, they sometimes appear to provide
an unfair reduction in the variance of estimated quantities (hence their
name). Indeed, the improvement in precision and the attendant reduction
in cost can be so great that a well-designed swindle can make feasible
Monte Carlo studies that might otherwise have been impossible. In
view of this, it is unfortunate that many Monte Carlo studies do not
employ variance reduction methods. This may be due in part to the
relatively restricted applicability of standard swindle methods or to
a lack of awareness of the methods.

Perhaps the most common application of Monte Carlo swindles in
statistics has been in estimating variances or mean squared errors. A
simulation study of variances often forms the basis for comparing the
small-sample efficiencies of a collection of, say, robust or resistant
estimators. A typical and traditional question might be "How much more
efficient is a 10% trimmed mean than the sample mean in samples of 20
from a heavy-tailed distribution?" The Princeton Robustness Study
(Andrews et al., 1972) provides a large and well-known example. On a
smaller scale, such studies are now a routine part of the scrutiny of
statistical procedures.

Our principal purpose in this paper is to unify and extend the
treatment of swindles for estimating variances in the hope that they may
then be applied more easily and widely. To do this, we (1) propose a new
swindle (based on Fisher's efficient score function) that is simpler,
more effective, and more general than the current
"Gaussian-over-independent" (G/I) method commonly used in variance
estimation, (2) provide examples of how this swindle can be applied, and
(3) describe how this and other swindles for variances fit into a
familiar geometric framework (which in turn suggests further
applications).

Section 3 presents the score function swindle first in the simplest
case: for location equivariant estimators in the location problem with
known scale. Section 4 presents a simple application to the problem of
estimating the variance of the Pitman estimator of location in small
samples drawn from Student's t distributions (and reports new results
for this problem). In Section 5 we compare the score function swindle
numerically with the standard Gaussian-over-independent swindle used in
the Princeton study. We examine the relative swindle gains for the two
methods for a variety of location estimators and distributions in the
t-family, and find that in most cases the score function technique
dominates. To facilitate the comparison, Section 2 presents some
general issues in assessing swindle gains, then outlines and discusses
the Gaussian-over-independent swindle.
Hammersley and Handscomb (1964) and Rubinstein (1981) discuss
general principles of variance reduction methods. Simon (1976) surveys
applications of swindles to simulation studies in statistical research.

The common swindles for estimating the variance of a statistic
T(Y) exploit a simple variance decomposition

$$\mathrm{Var}\,T = \mathrm{Var}\,S + \mathrm{Var}(T - S), \tag{1.1}$$

in which S and T - S are uncorrelated and Var S is either known from
the distribution of Y or can be easily approximated. Ideally S
should be highly correlated with T, for then Var(T - S) will be small
and hence, in general, more precisely estimable. A useful way to obtain
such decompositions is to identify an affine subset Z of statistics with
finite variance to which T belongs and take S as a minimum variance
element of Z. This is discussed further in Section 6, where it is
shown how both the score-function and G/I swindles and a number of
other swindles discussed in the literature may be obtained by varying
the choices of Z. Further applications of swindles based on
decomposition (1.1) to statistical decision theory (frequentist and
Bayesian), and of the score function swindle in particular to
multivariate, discrete and bootstrap location problems are outlined in
Section 7.
The 'regression estimate' of sampling theory (Cochran 1977,
ch. 7) suggests a variance reduction method which in some senses is
simpler and more widely applicable than the above approach. Let
$\hat\sigma_T^2$ be unbiased for Var T. Suppose there is available
another statistic S, preferably highly correlated with T, for which
$\sigma_S^2 = \mathrm{Var}\,S$ is known and possesses an unbiased
estimate $\hat\sigma_S^2$. Then

$$\hat\sigma_b^2 = \hat\sigma_T^2 + b\,(\hat\sigma_S^2 - \sigma_S^2) \tag{1.2}$$

is an unbiased estimate of Var T for all b, and has smaller variance
than $\hat\sigma_T^2$ for some interval of b values about the optimum
$b^* = -\mathrm{Cov}(\hat\sigma_T^2, \hat\sigma_S^2)/\mathrm{Var}(\hat\sigma_S^2)$.
However, the optimum $b^*$ will rarely be known, and will thus need
to be estimated, introducing a bias to $\hat\sigma_b^2$. On the other
hand, it is not necessary to have S and T - S uncorrelated for the
method to be applicable, as was needed for (1.1). A normal theory
calculation in Remark 8A suggests that the decomposition (1.1), when
available, leads to larger swindle savings than (1.2).
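The regression estimate (1.2) can be sketched in a few lines. This is an illustrative sketch only, not code from the paper: the function name `regression_estimate` and its arguments are hypothetical, T and S are assumed to have mean zero so that averages of squares estimate the variances, and the optimum b is replaced by its sample analogue, which (as noted above) introduces a bias.

```python
from statistics import mean


def regression_estimate(t_sq, s_sq, var_s):
    """Regression-type estimate of Var T in the spirit of (1.2).

    t_sq, s_sq : per-replication values T_j^2 and S_j^2 (T and S are
                 assumed to have mean zero).
    var_s      : the known value of Var S.
    Returns (estimate, b_hat), where b_hat is a plug-in estimate of b*.
    """
    sig_t = mean(t_sq)  # unbiased estimate of Var T
    sig_s = mean(s_sq)  # unbiased estimate of Var S
    # Both estimates are averages over the same replications, so the
    # ratio Cov/Var of the averages equals that of the summands.
    cov_ts = mean((t - sig_t) * (s - sig_s) for t, s in zip(t_sq, s_sq))
    var_ss = mean((s - sig_s) ** 2 for s in s_sq)
    b_hat = -cov_ts / var_ss
    return sig_t + b_hat * (sig_s - var_s), b_hat
```

When the squared values are exactly linearly related, the adjustment removes all sampling error in the estimate; in realistic cases the gain depends on how strongly the two sets of squares are correlated.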

This paper focuses on methods for increasing the precision of
variance estimates. Often comparisons of variance estimates in the
form $\mathrm{Var}\,T_1/\mathrm{Var}\,T_2$ or
$\mathrm{Var}\,T_1 - \mathrm{Var}\,T_2$ are sought, and assessments of
swindle gains will of course differ in these cases. Without attempting
a systematic discussion, we give in Remark 8B an assessment of the
swindle gains from the variance decomposition (1.1) in the ratio case.

The crude comparison of (1.1) and the 'regression estimate' (1.2)
(Remark 8A) can be extended to the case of differences
$\mathrm{Var}\,T_1 - \mathrm{Var}\,T_2$, with normal-theory calculations
suggesting that (1.1) dominates when the correlation between S and
$T_1$, and between S and $T_2$, is fairly strong and is of the same
order or stronger than the correlation between
2.  BACKGROUND


2.1  Measuring  Swindle  Gains 

Suppose that a decomposition (1.1) holds and that Var S is
known. We measure the gain in precision (or equivalently the reduction
in number of experiment replications needed) by comparing
$\mathrm{Var}\,\widehat{\mathrm{Var}}\,T$ to
$\mathrm{Var}\,\widehat{\mathrm{Var}}(T-S)$.

More specifically, assume that ET(Y) = ES(Y) = 0 and that naive
method-of-moments estimators are used for Var T and Var(T - S). For
example, Var T can be estimated by
$\frac{1}{N}\sum_{j=1}^{N} T^2(Y^{(j)})$, where j indexes
replications of the sample $Y = (Y_1,\ldots,Y_n)$. Then, of course,
$\mathrm{Var}\,\widehat{\mathrm{Var}}\,T = \frac{1}{N}\mathrm{Var}\,T^2(Y)$,
and from the identity

$$\mathrm{Var}\,T^2 = (ET^2)^2\,\bigl[ET^4/(ET^2)^2 - 1\bigr],$$

we obtain

$$\frac{\mathrm{Var}\,\widehat{\mathrm{Var}}\,T}{\mathrm{Var}\,\widehat{\mathrm{Var}}(T-S)}
  = \left[\frac{1}{1-\rho^2(T,S)}\right]^2 \frac{\kappa(T)-1}{\kappa(T-S)-1}. \tag{2.1}$$

Here $\kappa(T) = (ET^4)/(ET^2)^2$ denotes the kurtosis of T. The squared
correlation of T and S, $\rho^2(T,S)$, equals the relative efficiency
Var S/Var T whenever S is minimum variance in a linear class
containing T, as it will be in our applications (Section 6 has a proof).
Thus, subject to comparability of $\kappa(T)$ and $\kappa(T-S)$, the
efficiency of the swindle increases quadratically in
$1/(1-\rho^2(T,S))$ as the squared correlation of T and S approaches
one. Note also that the ratio (2.1) can be interpreted as the factor
$N_T/N_{T-S}$ by which the number of replications to achieve
a desired precision is reduced by using the swindle.
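To get a feel for the magnitudes implied by (2.1), the ratio can be evaluated directly. The helper below is a hypothetical illustration (the name `swindle_gain` is not from the paper); it simply transcribes the right-hand side of (2.1) from the squared correlation and the two kurtoses.

```python
def swindle_gain(rho_sq, kurt_t, kurt_resid):
    """Right-hand side of (2.1).

    rho_sq     : squared correlation rho^2(T, S), in [0, 1).
    kurt_t     : kurtosis kappa(T) = E T^4 / (E T^2)^2.
    kurt_resid : kurtosis kappa(T - S).
    """
    if not 0.0 <= rho_sq < 1.0:
        raise ValueError("rho_sq must lie in [0, 1)")
    return (1.0 / (1.0 - rho_sq)) ** 2 * (kurt_t - 1.0) / (kurt_resid - 1.0)


# With equal kurtoses the gain is driven by the correlation alone:
# rho^2 = 0.9 gives a factor of about 100, rho^2 = 0.99 about 10,000.
for rho_sq in (0.5, 0.9, 0.99):
    print(rho_sq, swindle_gain(rho_sq, 3.0, 3.0))
```

A heavier-tailed T - S reduces the gain proportionally: doubling the excess of kappa(T - S) over 1 halves the factor.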


In  measuring  swindle  gains,  one  should  also  assess  the  relative 
costs  of  computing  T  and  T  -  S  .  Since  these  will  typically  depend  on 
the  algorithm  and  the  machine,  we  will  not  indicate  these  comparisons 
explicitly.  However  if,  for  example,  S  is  a  Pitman  estimator,  the 
extra  effort  involved  in  finding  S  may  be  so  great  as  to  render  the 
swindle  impractical. 

2.2  Gaussian  Over  Independent  Swindle 

This swindle was introduced by Dixon and Tukey (1968) and Relles
(1970) and applied extensively in the Princeton study. To date it has
mainly been used for location problems: Simon (1976) gives a survey
discussion in this setting. We outline it here in a (more general)
regression setting in which at the same time the method seems more
natural (cf. also Goodfellow and Martin (1976)). Johnstone and Velleman
(1984) use this (and the score function swindle of Section 3) in a
small-sample comparison of several resistant simple linear regression
methods.

Suppose that observations are drawn from a linear model,
$Y = X\beta + \epsilon$, where Y is an n x 1 column vector, X is a fixed
n x p matrix of carriers, $\beta$ a p x 1 parameter vector and
$\epsilon$ an n x 1 vector of i.i.d. variables $Z_i/W_i$ drawn from a
Gaussian-over-independent distribution; i.e. $Z_i \sim N(0,\sigma^2)$,
and the $W_i$ are i.i.d. positive and independent of $Z_i$. (Table 1
lists some distributions in the Z/W family.)

Suppose that T(X,Y) is a regression-invariant estimator of $\beta$:
$T(X, cY - Xd) = cT(X,Y) - d$ for any $c \in \mathbb{R}^{+}$,
$d \in \mathbb{R}^p$. We seek a variance decomposition for Var T.
The denominators, $W_i$, constitute extra information available to
the simulation (but not available in real data, when only $(X_i, Y_i)$
are observed). Here they can be used to construct an estimator with
known variance. Indeed, conditional on W, $\beta$ and $\sigma^2$ can be
estimated by standard weighted least squares estimates $\hat\beta_W$ and
$\hat\sigma_W^2$, the former having covariance matrix
$\sigma^2 (X'\Lambda^2 X)^{-1}$ where $\Lambda = \mathrm{diag}(W_i)$.
From the normal theory assumptions on Z and conditional on W, it
follows that $(\hat\beta_W, \hat\sigma_W^2)$ are complete sufficient
statistics for $(\beta, \sigma^2)$, and that the standardized residuals
$e = (y - X\hat\beta_W)/\hat\sigma_W$ are ancillary. Basu's
sufficiency-ancillarity theorem (e.g. Simon 1976, Lehmann 1983 p. 46)
ensures independence of the triple $(\hat\beta_W, \hat\sigma_W^2, e)$.
Using the decomposition $y = X\hat\beta_W + \hat\sigma_W e$ and
regression invariance of T, we obtain

$$T(y) = \hat\beta_W + \hat\sigma_W\,T(e).$$

Suppose also that T is unbiased. (This will happen for example
if regression invariance holds for negative values of c also.) If
the distribution used to generate the data satisfies $\sigma^2 = 1$,
$\beta = 0$, then conditional on W and using independence,

$$\mathrm{Var}(T(y) \mid W) = (X'\Lambda^2 X)^{-1} + \mathrm{Var}(T(e) \mid W).$$

Now take expectation with respect to W, and use the fact that
$E(T(y) \mid W) = 0$, $E(T(e) \mid W) = 0$ to obtain finally

$$\mathrm{Var}\,T(y) = E\,(X'\Lambda^2 X)^{-1} + \mathrm{Var}\,T(e).$$
In many cases, the first term can be evaluated analytically,
numerically, or once-and-for-all by Monte Carlo, while Var T(e),
being smaller than Var T(y), can, in principle, be estimated more
accurately (cf. equation (2.1)).
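In the pure location case (X a column of ones, p = 1) the construction above can be sketched as follows. This is a hedged illustration, not the Princeton study's code: the helper names `gi_pieces`, `draw_t_sample` and `gi_swindle_variance` are hypothetical, Student's t data are generated as Z/W with W the square root of a chi-squared variate over its degrees of freedom, and the first term E(1/sum of W_i^2) is simply averaged over the same replications rather than evaluated analytically.

```python
import math
import random
from statistics import median


def gi_pieces(y, w):
    """Weighted least squares location, scale and standardized residuals."""
    sw2 = sum(wi * wi for wi in w)                       # X' Lambda^2 X
    beta_w = sum(wi * wi * yi for wi, yi in zip(w, y)) / sw2
    rss = sum((wi * (yi - beta_w)) ** 2 for wi, yi in zip(w, y))
    sigma_w = math.sqrt(rss / (len(y) - 1))
    e = [(yi - beta_w) / sigma_w for yi in y]
    return beta_w, sigma_w, e, sw2


def draw_t_sample(n, nu, rng):
    """Student-t(nu) sample in Z/W form, returning the data and the W's."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # chi-squared(nu) is Gamma(shape nu/2, scale 2)
    w = [math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu) for _ in range(n)]
    return [zi / wi for zi, wi in zip(z, w)], w


def gi_swindle_variance(estimator, n, nu, reps, rng):
    """Average of 1/sum(W^2) + T(e)^2 over replications."""
    total = 0.0
    for _ in range(reps):
        y, w = draw_t_sample(n, nu, rng)
        _, _, e, sw2 = gi_pieces(y, w)
        total += 1.0 / sw2 + estimator(e) ** 2
    return total / reps


print(gi_swindle_variance(median, 10, 3, 500, random.Random(1)))
```

The estimator passed in must be location and scale equivariant and symmetric about zero in distribution, as the median is, for the identity T(y) = beta_w + sigma_w T(e) and the zero-mean assumption to hold.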

2.3  Limitations  of  the  Z/W  Swindle 

The Gaussian-over-independent swindle depends crucially on the
Z/W representation for the distribution of the underlying data. While
the class of distributions that arise as variance mixtures of normals is
rich (see, for example, Andrews and Mallows, 1974 and Efron and Olshen,
1978), it is only a subset of the symmetric distributions. Even within
this subset the gains realized from the swindle tend to decrease for
Z/W distributions with heavy tails.

This phenomenon can be accounted for in part by restrictions on
the first term in the swindle gain (2.1). Heavy tailed Z/W
distributions must have some $W_i$ far from unity. Knowledge of W will
then convey more information about the sample, leading the variance of
$\hat\beta_W$ to fall further below the smallest attainable variance.
This bounds $\rho^2(T, \hat\beta_W)$
$(= \mathrm{Var}\,\hat\beta_W/\mathrm{Var}\,T)$ away from 1.0.

Figure 1 illustrates this effect for Student's t distributions
(for which $W_i \sim \sqrt{\chi^2_\nu/\nu}$). The Pitman variances in Figure 1 are the
smallest  attainable  among  invariant  estimators  of  location.  All  other 
estimators  would  thus  appear  above  the  Pitman  values.  (See  Appendix  A 
for  the  Pitman  variances  and  notes  on  how  they  were  estimated.) 

While we might ideally wish to swindle relative to the Pitman
estimators (see, e.g., Andrews et al., 1972, p. 61 and Section 5 below),
the expense of computing them would cancel much of the swindle gain.
However, as Figure 1 shows, often the Pitman efficiency is not far from
the Cramer-Rao bound. This fact motivates the method of the next
section, in which "estimators" with variance equal to the Cramer-Rao
bound are used to obtain higher correlation with T(y) while still
possessing a variance decomposition.


3.  THE  SCORE  FUNCTION  SWINDLE 

Suppose now that $Y_1,\ldots,Y_n$ are independent random variables and
that $Y_i$ has density $f_i(y,\theta)$,
$\theta \in \Theta \subset \mathbb{R}^p$. (Often the $f_i$ will be
identical.) We shall assume that the densities are smooth enough
to permit the manipulations below (the "Cramer conditions" (Cramer, 1946)
would more than suffice). Suppose that T(Y) is unbiased for $\theta_1$,
at least up to a constant:

$$E_\theta\,T(Y) = \theta_1 + c_0, \qquad \theta \in \Theta.$$

Given an arbitrary vector-valued statistic S, the linear combination
$c'S$ having maximum correlation with T is just the (population)
linear regression of T on S, $c' = \sigma_{TS}'\,\Sigma_S^{-1}$, where
$\Sigma_S = \mathrm{Cov}(S)$ and $\sigma_{TS} = \mathrm{Cov}(T,S)$. The
resulting variance decomposition is

$$\mathrm{Var}\,T = \sigma_{TS}'\,\Sigma_S^{-1}\,\sigma_{TS} + \mathrm{Var}(T - c'S).$$

In general $\sigma_{TS}$ will be no easier to estimate than Var S.

However, the score function,
$S(Y,\theta) = \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \log f_i(y_i,\theta)$,
provides a statistic S for which $\sigma_{TS} = \mathrm{Cov}(T,S)$ is
simple, and yields a random vector with high (multiple) correlation with
T when T has variance close to the Cramer-Rao lower bound. To see this,
differentiate the relation $E_\theta\,T(Y) = \theta_1 + c_0$ to obtain

$$\delta_{1k} = \int T(y)\,\frac{\partial}{\partial\theta_k}\prod_{i=1}^{n} f_i(y_i,\theta)\,dy
  = E_\theta\,T(Y)\,S_k(Y,\theta).$$

Recalling that $E_\theta\,S_k(Y,\theta) = 0$, we find by fixing $\theta$
at the value (say $\theta_0$) used in generating the data that
$S = S(Y,\theta_0)$ is a "statistic" with the property
$\sigma_{TS} = e_1 = (1,0,\ldots,0)'$, and hence

$$\mathrm{Var}\,T = e_1'\,\Sigma_S^{-1}\,e_1 + \mathrm{Var}(T - e_1'\,\Sigma_S^{-1}\,S). \tag{3.1}$$

Since $\Sigma_S = \mathrm{Cov}(S)$ depends only on the densities $f_i$
it is, in principle, known or calculable, and the Monte Carlo can be
restricted to estimation of $\mathrm{Var}(T - e_1'\,\Sigma_S^{-1}\,S)$.

Remarks:  a)  This  approach  can  be  extended  by  taking  higher  derivatives  of 
the  likelihood  function.  We  discuss  in  Remark  7C  the  amount  gained  by 
the  more  refined  swindles  that  result. 

b)  Some  proposals  for  the  use  of  score  functions  and  Cramer-Rao 
bounds  are  discussed  in  the  setting  of  simultaneous  simulation  of  two 
variances  in  Appendix  B  of  Andrews  et  al.  (1972). 

Examples 

1)  Location. Let $Y_i$ have density $f(y-\theta)$ for f positive and
piecewise $C^1$ on $\mathbb{R}$. Let T(Y) be a location equivariant
estimator: $T(Y_1+c,\ldots,Y_n+c) = T(Y)+c$. Then clearly
$E_\theta\,T(Y) = \theta + c_0$, where $c_0 = E_0\,T(Y)$ and
$\theta_0 = 0$, $S = -\sum_{i=1}^{n} f'/f(Y_i)$ and
$\Sigma_S = \mathrm{Cov}\,S = nI(f)$, with I(f) the Fisher information
(for location) of the density f. Thus we have the decomposition

$$\mathrm{Var}\,T(Y) = 1/(nI(f)) + \mathrm{Var}\,T(Y - \hat\theta(Y)\,1) \tag{3.2}$$

where $\hat\theta(Y) = (nI(f))^{-1} S$. Thus if one knows both $f'/f$ and
$I(f) = E(f'/f)^2$, then the swindle simply bases the Monte Carlo
estimator on the data Y centered by $\hat\theta(Y)$. Note that S is not
in general a location-equivariant estimator itself (in fact it will be
if, and only if, f is Gaussian!), but this is irrelevant to the swindle
calculation. Significantly, there is no need for f to be symmetric.
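A minimal sketch of the decomposition (3.2) in use, comparing the naive and swindled estimates of the variance of the sample median. It assumes Student's t data, for which -f'/f(y) = (nu+1)y/(nu+y^2) and I(f) = (nu+1)/(nu+3) are standard facts; the function names (`t_sample`, `score_center`, `variance_estimates`) are hypothetical and not taken from the paper.

```python
import math
import random
from statistics import median


def t_sample(n, nu, rng):
    """Draw n Student-t(nu) variates as Z / sqrt(chi2_nu / nu)."""
    return [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
            for _ in range(n)]


def score_center(y, nu):
    """theta_hat(Y) = S / (n I(f)) for the t_nu location family at theta = 0."""
    info = (nu + 1.0) / (nu + 3.0)                               # I(f)
    score = sum((nu + 1.0) * yi / (nu + yi * yi) for yi in y)    # S = -sum f'/f
    return score / (len(y) * info)


def variance_estimates(estimator, n, nu, reps, rng):
    """Return (naive, swindle) estimates of Var estimator, following (3.2)."""
    info = (nu + 1.0) / (nu + 3.0)
    naive = resid = 0.0
    for _ in range(reps):
        y = t_sample(n, nu, rng)
        t = estimator(y)
        naive += t * t
        # By location equivariance, T(Y - theta_hat 1) = T(Y) - theta_hat.
        resid += (t - score_center(y, nu)) ** 2
    return naive / reps, 1.0 / (n * info) + resid / reps


print(variance_estimates(median, 20, 5, 2000, random.Random(0)))
```

Both numbers estimate the same quantity; the second replaces most of the variance by the known term 1/(nI(f)) and leaves only the smaller residual variance to be simulated.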

The score function swindle includes situations in which the data
are not an i.i.d. sample. A common example is the "one-wild" sampling
scheme in location problems (Andrews et al., 1972; Hoaglin, Mosteller
and Tukey, 1983, Chs. 10, 11) in which n - 1 observations are drawn
from f(y) and one from $(1/\sigma_0) f(y/\sigma_0)$ for $\sigma_0$ known
and large. The score function is
$S(Y,0) = -\phi(Y_1/\sigma_0)/\sigma_0 - \sum_{i=2}^{n} \phi(Y_i)$,
with $\phi = f'/f$, and $\Sigma_S = (n-1+\sigma_0^{-2})\,I(f)$.

Suppose now that f includes scale as a nuisance parameter, $Y_i$
having density $f((y-\mu)/\sigma)$. Now $\theta = (\mu,\sigma)$ and if
T(Y) is a location and scale equivariant estimator of $\mu$ then
$E_\theta\,T(Y) = \mu + \sigma E_{\theta_0} T(Y)$ (where
$\theta_0 = (0,1)$). Here the score function is
$S = \bigl(\sum_{i=1}^{n} \psi(Y_i),\ \sum_{i=1}^{n} Y_i\,\psi(Y_i)\bigr)$,
where $\psi(y) = -f'/f(y)$, and

$$\Sigma_S = n \begin{bmatrix} E\psi^2 & EY\psi^2 \\ EY\psi^2 & EY^2\psi^2 \end{bmatrix}.$$

Thus,

$$\mathrm{Var}\,T(Y) = \frac{1}{n}\,\frac{EY^2\psi^2}{E\psi^2\,EY^2\psi^2 - (EY\psi^2)^2}
  + \mathrm{Var}\,T(Y - e_1'\,\Sigma_S^{-1}\,S\,1). \tag{3.3}$$

In general, one would expect that as the number of nuisance
parameters increases and the unbiasedness condition becomes more
stringent, the Cramer-Rao variance bound (the first term in (3.1)) would
increase, thus giving a better swindle. For example, if
$f(y) = c\,e^{y}\,I\{y<0\} + c\,e^{-y^2/2}\,I\{y \ge 0\}$, the
Cramer-Rao bound is easily calculated to be 1.5r6
times the bound obtained in (3.1) without the nuisance parameter for
scale.

Note, however, that if f is symmetric about 0, then $EY\psi^2(Y) = 0$
and the above decomposition reduces to (3.2), so that the swindle does
not gain by including the scale parameter. This is an instance of a
more general phenomenon: if the original estimation problem for
$\theta_1$ satisfies Stein's necessary condition for adaptation, then
the swindle cannot be improved by adding a finite number of nuisance
parameters to the model. In the present context, Stein's condition
simply requires that $\mathrm{Cov}(S_1, S_k) = 0$ for $k \ge 2$, where
$(S_1, S_2, \ldots, S_p)$ is the score function vector for the augmented
model. For more information on adaptive (asymptotic) estimation see
Stein (1956), Bickel (1982).

2)  Regression. For simplicity we discuss here only estimation of
slope in simple linear regression, though the ideas generalize to
arbitrary linear models. Suppose then that we draw n observations
from the model $Y_i = \alpha + \beta(x_i - \bar x) + \epsilon_i$, where
the $x_i$ are fixed and the $\epsilon_i$ are i.i.d. according to some
smooth positive density f. (If f is symmetric, we would gain nothing by
including a nuisance parameter for scale.) Suppose that T(y) is a
regression-invariant estimator of $\beta$:
$T(y - b(x - \bar x 1)) = T(y) - b$. In previous notation,
$\theta = (\beta,\alpha)$, $\theta_0 = (0,0)$,
$S = -\bigl(\sum_i (x_i - \bar x)\,\phi(y_i),\ \sum_i \phi(y_i)\bigr)$
with $\phi = f'/f$, and because the $x_i$ are centered,
$\mathrm{Cov}\,S = \mathrm{diag}(n\sigma_x^2 I(f),\ nI(f))$, where
$\sigma_x^2 = \frac{1}{n}\sum_i (x_i - \bar x)^2$. Thus
$e_1'\,\Sigma_S^{-1}\,S = -(n\sigma_x^2 I(f))^{-1} \sum_i (x_i - \bar x)\,\phi(y_i)$,
which in the special case $\epsilon_i \sim N(0,1)$ is just the least
squares estimate of $\beta$. Finally
$e_1'\,\Sigma_S^{-1}\,e_1 = [n\sigma_x^2 I(f)]^{-1}$. The swindle has
obvious extensions to heteroscedastic situations in which, say,
$\mathrm{Var}(\epsilon)$ varies with x, or to cases in which the $x_i$
themselves are a random sample from a distribution. This method was
used extensively in the regression study of Johnstone and Velleman
(1984).
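The slope control statistic of the regression example can be written out directly. The sketch below is illustrative only; `slope_control` and `ls_slope` are hypothetical names, `phi` stands for f'/f and `info` for I(f), both supplied by the user for the error density in use. It also checks numerically the remark that the control reduces to the least squares slope for N(0,1) errors, where f'/f(y) = -y and I(f) = 1.

```python
def slope_control(x, y, phi, info):
    """-(n sigma_x^2 I(f))^{-1} * sum (x_i - xbar) phi(y_i), with phi = f'/f."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)          # equals n * sigma_x^2
    return -sum((xi - xbar) * phi(yi) for xi, yi in zip(x, y)) / (sxx * info)


def ls_slope(x, y):
    """Ordinary least squares slope, for comparison."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.3, -1.2, 0.8, 2.0, 1.1]
gauss_phi = lambda v: -v                              # f'/f for N(0,1)
print(slope_control(x, y, gauss_phi, 1.0), ls_slope(x, y))
```

In a simulation with alpha = beta = 0, one would estimate Var(T - control) by Monte Carlo and add the known term 1/(n sigma_x^2 I(f)).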

3)  Scale estimation. If $S(Y_1,\ldots,Y_n)$ is a scale-equivariant
estimate, $S(cY_1,\ldots,cY_n) = cS(Y)$, and the $Y_i$ are i.i.d. from a
density $f(y/\sigma)$ with location known, then
$E_\sigma \log S(Y) = \log\sigma + E_1 \log S(Y)$ and log S(Y) is
unbiased (up to a constant) for $\log\sigma$. Now it is often argued
that Var log S(Y) is an informative measure of performance of S(Y)
(see e.g. Simon, 1976, §5). To apply the location score function
swindle, set $\sigma' = \log\sigma$ and let
$p(y,\sigma') = f(y\,e^{-\sigma'})$. The score function evaluated for
$\sigma'_0 = 0$ equals $-\sum_{i=1}^{n} y_i\,f'/f(y_i)$
and the variance decomposition becomes
it follows for any estimator $\hat\beta$ satisfying (3.4), that for
fixed z, $E_\beta\,\hat\beta(z,t) = c + \beta$ and hence from (3.1)

$$\mathrm{Var}_\beta\,\hat\beta = \mathrm{Var}_\beta\,S + \mathrm{Var}_\beta(\hat\beta - S) \tag{3.5}$$

where $S = (\sum z_i^2)^{-1} \sum z_i\,(1 - t_i\,e^{\beta z_i})$ and
$\mathrm{Var}\,S = (\sum z_i^2)^{-1}$.

Extension to arbitrary (but known) baseline hazard rate is
straightforward but perhaps restrictive: if
$\lambda(z,t) = \theta(t)\,e^{\beta z}$, $t > 0$, for $\theta > 0$ known
and $\Theta(t) = \int_0^t \theta(s)\,ds$, then (3.5) remains valid
for estimators $\hat\beta$ (such as the MLE) which satisfy
$\hat\beta(z, \Theta^{-1}(e^{\gamma z_i}\,\Theta(t_i))) = \hat\beta(z,t) - \gamma$,
if $t_i$ in the definition of S is replaced by $\Theta(t_i)$.


4.  AN  EXAMPLE:  PITMAN  VARIANCES 

This section illustrates the use of the score function swindle in
a simple but instructive situation: the computation of the variances
of the Pitman estimators of location under sampling from distributions
in the $t_\nu$ family. The Pitman estimator of location $\theta$ based on
n i.i.d. observations from a density $f(x - \theta)$ on $\mathbb{R}$
(see (4.1) below) has minimum variance amongst all location equivariant
estimators (Pitman, 1939). It is thus a natural baseline against which
to measure the relative efficiencies of other location-equivariant
estimators.

In general, however, the variances of Pitman estimators cannot be
evaluated analytically. Hoaglin (1975) reports on numerical evaluations
of Pitman variances for selected small sample sizes from three particular
distributions (including the Cauchy, or $t_1$). For the $t_\nu$ family
used in our swindle comparison experiments in Section 5, no other
estimates of Pitman variances seem to be available in the literature.

Our  Monte  Carlo  trials  to  obtain  the  Pitman  variances  are  in 
principle  a  straightforward  application  of  the  score-function  swindle 
in  the  form  (3.2).  In  fact,  (1.1)  and  (2.1)  reveal  that  we  are  in  the 
situation  where  this  swindle  is  most  effective:  since  the  Pitman 
estimator  is  the  minimum  variance  location  estimator,  it  has  maximum 
possible  correlation  (among  location  estimators)  with  the  score  function 
statistic,  whose  variance  is  much  more  readily  evaluated. 

The Pitman estimator of $\theta$ based on n observations from the
$t_\nu$ distribution is given by

$$d_\nu(y) = \frac{\int \theta \prod_{i=1}^{n} f_\nu(y_i-\theta)\,d\theta}
                  {\int \prod_{i=1}^{n} f_\nu(y_i-\theta)\,d\theta} \tag{4.1}$$

where $f_\nu(y) = c_\nu\,(1+y^2/\nu)^{-(\nu+1)/2}$ and
$c_\nu = \Gamma((\nu+1)/2)/[\Gamma(\nu/2)\sqrt{\nu\pi}]$. The
score function and Fisher information are

$$S = (\nu+1)\sum_{i=1}^{n} \frac{y_i}{\nu + y_i^2}, \qquad I(f_\nu) = \frac{\nu+1}{\nu+3}.$$

Thus it is only necessary to estimate the variance of $d_\nu$ when
applied to samples after centering at S, and then to add this to
$1 + 2/(\nu+1)$ to obtain an estimate of the Pitman variance.
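The ratio of integrals in (4.1) has no closed form for general degrees of freedom, but it can be approximated numerically. The sketch below uses a plain Riemann sum on a grid centered at the sample median; this is an assumption made for illustration (the function names, grid width and step count are hypothetical), not the numerical method used for the paper's tables. The normalizing constant of the t density cancels in the ratio and is omitted.

```python
import math
from statistics import median


def log_t_kernel(y, nu):
    """log f_nu(y) up to the additive constant log c_nu."""
    return -0.5 * (nu + 1.0) * math.log1p(y * y / nu)


def pitman_t(y, nu, half_width=30.0, steps=6000):
    """Approximate the Pitman estimator (4.1) by a Riemann sum over theta."""
    center = median(y)                      # the grid moves with the data
    h = 2.0 * half_width / steps
    offsets = [-half_width + k * h for k in range(steps + 1)]
    logs = [sum(log_t_kernel(yi - (center + u), nu) for yi in y) for u in offsets]
    top = max(logs)                         # rescale to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    # Weighted mean of theta, written as center + weighted mean of offsets.
    return center + sum(u * wt for u, wt in zip(offsets, weights)) / sum(weights)


sample = [-1.3, 0.2, 0.5, 2.1, 0.9]
print(pitman_t(sample, 3))
```

Because the grid is centered on the data, the approximation inherits the location equivariance of the exact estimator, which is what the score function swindle relies on when the sample is centered before the estimator is applied.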

The  following  section  (especially  Table  3)  documents  the  dramatic 
increase  in  precision  (in  terms  of  sampling  variability)  of  these 
variance  estimates  over  those  obtained  by  the  standard  G/I  swindle. 


TABLE  2 


VARIANCES  OF  THE  PITMAN  ESTIMATES  OF 
LOCATION  FOR  SMALL  SAMPLES  FROM 
STUDENT'S  t  POPULATION 

Variance  (standard  error  in  units  of  last  reported 
decimal  place) 



5.  A NUMERICAL COMPARISON


The performance of variance decomposition swindles depends on the
three measures in equation (2.1): (i) $\rho^2$, the squared correlation
between the statistic T and the control S; (ii) $\kappa(T)$, the
kurtosis of the sampling distribution of T on samples of size n from the
underlying distribution of the data, F; and (iii) $\kappa(T-S)$, the
kurtosis of the sampling distribution of T - S (or equivalently, of T
applied to the residuals after removing S). These values depend on T,
on S, on the underlying distribution F, and on the sample size, n. We
describe the results of a simulation comparison of the score function
and Gaussian-over-independent swindles in selected location problems.

5.1.  Correlation  Term 

We  have  previously  noted  that  the  relative  size  of  Pitman, 
Cramer-Rao  and  optimal  weighted  least  squares  variances  limits  the 
maximum  possible  correlation  between  T  and  the  control  S  .  However, 
the  behavior  depicted  in  Figure  1  is  itself  dependent  on  sample  size. 
Figure  2  shows  the  effect  of  sample  size  on  the  Pitman,  Cramer-Rao,  and 
optimal weighted least squares variances for Cauchy data. The score
function  swindle  offers  relatively  little  advantage  in  samples  smaller 
than  10,  but  substantial  advantages  in  samples  larger  than  20  where  the 
computational  effort  needed  for  the  naive  variance  estimates  is  of 
course  much  greater.  (The  advantages  will  generally  also  be  greater 
for  less  extreme  distributions.) 
We emphasize that the variance decomposition swindles will
generally perform better when applied to more efficient statistics.
Thus for the same computing expense we will learn more about the better
performing (and thus usually more interesting) statistics. This
phenomenon was used to advantage in estimating the Pitman variances of
the previous section.

5.2.  Kurtosis Terms

The kurtosis ratio that forms the second factor in equation (2.1)
makes it desirable that the kurtosis of (T-S) not be substantially
greater than the kurtosis of T. Often T will be asymptotically
normal and even its small-sample sampling distribution will be very
nearly normal. (This is true, for example, of many robust estimators
of location or regression even at very heavy-tailed densities.)
Unfortunately, the sampling distribution of (T - S) can be very
leptokurtic. In general, the kurtosis of (T-S) tends to be higher when
T and S are highly correlated (thus counteracting the advantage of
the high correlation somewhat), when the underlying density is itself
leptokurtic, and when the sample size is small. We have no good way to
predict the kurtosis ratio; however, we have estimated it in Monte Carlo
experiments by accumulating $\sum T^4$ as well as $\sum (T-S)^4$.
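The bookkeeping described above amounts to carrying second and fourth power sums through the simulation. A minimal sketch under the zero-mean assumption of Section 2.1 follows; the function names are hypothetical, and the gain returned is the plug-in analogue of the ratio (2.1) computed from the accumulated sums.

```python
def kurtosis_from_sums(sum2, sum4, reps):
    """Sample analogue of kappa = E T^4 / (E T^2)^2 from accumulated sums."""
    m2 = sum2 / reps
    m4 = sum4 / reps
    return m4 / (m2 * m2)


def estimated_gain(t_values, resid_values):
    """Return (kappa_T, kappa_resid, gain) from per-replication T and T - S."""
    n = len(t_values)
    s2_t = sum(t ** 2 for t in t_values)
    s4_t = sum(t ** 4 for t in t_values)
    s2_r = sum(r ** 2 for r in resid_values)
    s4_r = sum(r ** 4 for r in resid_values)
    k_t = kurtosis_from_sums(s2_t, s4_t, n)
    k_r = kurtosis_from_sums(s2_r, s4_r, n)
    # Var of a variance estimate is (E T^2)^2 (kappa - 1) / N in each case.
    var_naive = (s2_t / n) ** 2 * (k_t - 1.0) / n
    var_swindle = (s2_r / n) ** 2 * (k_r - 1.0) / n
    return k_t, k_r, var_naive / var_swindle
```

With few replications the fourth-moment sums are themselves noisy, so the estimated gain should be read as a rough indication rather than a precise figure.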

The degradation of variance estimates for leptokurtic densities
is well known. (See, for example, Yule and Kendall 1950, p. 443.)
Briefly, the more extreme instances provide much of the information
about the variance, thus reducing the effective sample size. In the
swindle, when var(T-S) is very small, a few extraordinary samples
with large (T-S) can dominate the variance estimate.


5.3  Performance


Table 3 summarizes the performance of these variance decomposition
swindles in a variety of situations. For location estimators the
swindle gains are smallest for the most extreme population distributions
(i.e., t on small degrees of freedom) and increase as the distributions
approach the Gaussian. The larger swindle gains reflect high
efficiencies of particular estimators at particular distributions.
Figure 3 adds swindle gain information to Figure 1. The dependence of
swindle gain on efficiency can be seen especially clearly at $t_2$.

Only rarely in these trials was the Gaussian-over-independent
swindle more effective than the score function swindle. Usually the
latter was 10 to 50 times more effective.

The  larger  swindle  gains  deliver  results  with  precision  simply 
not  obtainable  by  naive  methods.  A  typical  trial  of  1000  replications 
required  over  100  seconds  of  CPU  time  on  an  IBM  370/168.  In  the  most 
extreme case (5% trimmed mean for samples of 40 from $t_{16}$) naive
methods would have required over 15 CPU days of computing time for
equivalent  precision.  The  results  for  the  Pitman  variance  in  the  same 
situation  would  have  required  158  CPU  days.  Of  course  these  last 
figures  should  not  be  taken  too  literally,  as  other  sources  of  error 
(numerical,  rounding)  have  not  been  assessed.  What  is  clear  is  that 
sampling  variability  can  be  substantially  reduced  (or  even  effectively 
eliminated  in  efficient  cases) . 

Although the G/I method does not apply to asymmetric densities,
the score function swindle does. Trials on the rather extreme absolute
Cauchy distribution yielded swindle gains of up to 20 in samples of 40
and up to 10 in samples of 20.
6.  A SIMPLE FRAMEWORK

The simple geometrical setting given here provides a way to think
about swindles for variances that can suggest new applications,
including some of those discussed in Section 7. The result (6.1) below
is standard in estimation theory (Rao, 1973, Lehmann, 1983) but its
role in Monte Carlo studies is explicitly noted in unpublished lecture
notes of Charles Stein.

Suppose that T(Y) belongs to an affine subset $\mathcal{Z}$ (translate
of a linear subspace) of the class of all estimators having finite
variance under the distribution $P_0$ generating the data. Suppose also
that S(Y) is the best (minimum variance) estimator belonging to
$\mathcal{Z}$. Then a version of Pythagoras' theorem gives the variance
decomposition

(6.1)  $\operatorname{var}_{P_0} T = \operatorname{var}_{P_0} S + \operatorname{var}_{P_0}(T-S)$.

One way to see this is to note that since $\mathcal{Z}$ is affine,
$S + \epsilon(T-S)$ lies in $\mathcal{Z}$ for each $\epsilon$, so that
$\operatorname{var}_{P_0}(S + \epsilon(T-S))$ is minimized at
$\epsilon = 0$. Differentiation shows that S and T-S are uncorrelated,
which amounts to (6.1). We note in passing that expanding
$\operatorname{var}_{P_0}(T-S)$ shows that
$\rho^2(T,S) = \operatorname{Var} S/\operatorname{Var} T$, as remarked
in Section 2.1.

The usefulness of (6.1) hinges on the choice of $\mathcal{Z}$, since
as $\mathcal{Z}$ increases, $\operatorname{var}_{P_0} S$ decreases.
Recall that we want $\operatorname{var}_{P_0} S$ to be both known (or
easily calculated) and large. We illustrate this first by seeing how a
number of swindles in the literature on location estimation fit into
this framework.
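As a concrete instance of (6.1), the following minimal sketch takes
Gaussian samples, uses the sample mean as the control S (so that
var S = 1/n is known) and the sample median as T. The distribution, the
estimator, the sample size and the function name `mean_control_swindle`
are illustrative assumptions and are not taken from the text. Both
estimates target the same variance; the swindled one only has to
estimate the small term var(T - S).

```python
import random
import statistics

def mean_control_swindle(n=5, reps=4000, seed=0):
    """Naive and swindled Monte Carlo estimates of var(median) for N(0,1) data."""
    rng = random.Random(seed)
    sum_t2 = 0.0   # accumulates T^2 (naive estimate, since E T = 0)
    sum_d2 = 0.0   # accumulates (T - S)^2
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = sum(y) / n             # control S: sample mean, var S = 1/n
        t = statistics.median(y)   # estimator T under study
        sum_t2 += t * t
        sum_d2 += (t - s) ** 2
    naive = sum_t2 / reps
    swindled = 1.0 / n + sum_d2 / reps   # known var S + estimated var(T - S)
    return naive, swindled

naive, swindled = mean_control_swindle()
print(f"naive: {naive:.4f}  swindled: {swindled:.4f}")
```

Repeating the experiment with different seeds gives a rough feel for
how much less the swindled estimate fluctuates than the naive one.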


Suppose that the data, Y, consist of n i.i.d. observations
from a density $f(y-\theta)$ on R. Reasonable estimators, T, of
$\theta$ are (at least) location equivariant, so to estimate the
variance of T, with no loss of generality choose $P_0$ above to
correspond to $\theta = 0$.

(i)  (Stein)  a) Let $\mathcal{Z}$ be the class of all unbiased location
equivariant estimators. Then S is the Pitman estimator

$S(y) = \int \theta \prod_{i=1}^{n} f(y_i-\theta)\,d\theta \Big/ \int \prod_{i=1}^{n} f(y_i-\theta)\,d\theta$.

Typically,  var  S  is  not  known  analytically  and  must  also  be  estimated 
(see  Appendix  A  for  discussion) .  An  estimator  suggested  by  Stein  is 


(6.2)  $\widehat{\operatorname{var}}\,T = \frac{1}{N}\sum_{J=1}^{N} E\bigl[(S^{(J)})^2 \mid V^{(J)}\bigr] + \frac{1}{N}\sum_{J=1}^{N}\bigl(T^{(J)} - S^{(J)}\bigr)^2$.

Here the superscript J refers to the J-th replication of the i.i.d.
sample $(Y_1,\ldots,Y_n)$ from $P_0$, and $V^{(J)}$ to a maximal
invariant $(Y_1^{(J)} - Y_n^{(J)},\ldots,Y_{n-1}^{(J)} - Y_n^{(J)})$.
The right side of (6.2) is the conditional expectation of the naive
estimate $\frac{1}{N}\sum_J (T^{(J)})^2$ given $V^{(1)},\ldots,V^{(N)}$,
so $\widehat{\operatorname{var}}\,T$ is certainly a more precise (i.e.,
lower variance) estimate of var T than the naive one. Of course, a
(univariate) numerical integration is needed to compute each $S^{(J)}$
and $E[(S^{(J)})^2 \mid V^{(J)}]$, but these can then be used repeatedly
in estimating the variances of many equivariant estimators.

b) The same program is possible if $\mathcal{Z}$ is restricted to the
class of location and scale equivariant estimators, with the Pitman
location-scale estimator serving as the "control function" S. Of course,
bivariate numerical integrations are now necessary.
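The univariate integration needed for the Pitman location estimator in
(i) a) can be sketched as follows. This is a minimal illustration that
assumes a standard Cauchy density and a plain trapezoid rule on a grid
centred at a middle order statistic; the names `cauchy_pdf` and
`pitman_location`, the grid half-width and the number of steps are all
hypothetical choices, not the authors' implementation.

```python
import math

def cauchy_pdf(x):
    """Standard Cauchy density (an assumed example of f)."""
    return 1.0 / (math.pi * (1.0 + x * x))

def pitman_location(y, pdf=cauchy_pdf, half_width=50.0, steps=20001):
    """Ratio of integrals of theta*L(theta) and L(theta), by the trapezoid rule."""
    center = sorted(y)[len(y) // 2]       # centre the grid on the data
    lo = center - half_width
    h = 2.0 * half_width / (steps - 1)
    num = den = 0.0
    for k in range(steps):
        theta = lo + k * h
        like = 1.0
        for yi in y:
            like *= pdf(yi - theta)        # likelihood prod f(y_i - theta)
        w = 0.5 if k in (0, steps - 1) else 1.0   # trapezoid end weights
        num += w * theta * like
        den += w * like
    return num / den                       # the step h cancels in the ratio

sample = [-0.3, 0.1, 0.4, 2.5, 7.0]
print(pitman_location(sample))
```

Because the grid moves with the data, shifting every observation by a
constant shifts the computed estimate by the same constant, which is a
convenient sanity check on location equivariance.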


(ii)  (Takeuchi, 1971) $\mathcal{Z}$ = unbiased linear combinations of
order statistics $Y_{(i)}$ with weights $c_i$ (not necessarily positive)
summing to 1. Then $S(Y) = \sum_i c_i^* Y_{(i)}$, where
$c^* = \Sigma^{-1}\mathbf{1}/(\mathbf{1}'\Sigma^{-1}\mathbf{1})$ and
$\Sigma_{ij} = \operatorname{cov}(Y_{(i)}, Y_{(j)})$. Thus, once
$\Sigma$ is known, both S and var S are readily computable.
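Given a covariance matrix of the order statistics, the weights in (ii)
reduce to one linear solve. The sketch below assumes the covariance
matrix is already available (the 3 x 3 matrix used here is made up
purely for illustration) and uses a small hand-written Gaussian
elimination so that it needs no external library; `solve` and
`best_linear_weights` are hypothetical helper names.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                        # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def best_linear_weights(cov):
    """Weights c* = cov^{-1} 1 / (1' cov^{-1} 1) and the resulting variance."""
    u = solve(cov, [1.0] * len(cov))   # u = cov^{-1} 1
    total = sum(u)                     # 1' cov^{-1} 1
    return [ui / total for ui in u], 1.0 / total

# Hypothetical covariance matrix of three order statistics.
cov = [[1.0, 0.5, 0.3],
       [0.5, 0.8, 0.5],
       [0.3, 0.5, 1.0]]
weights, var_s = best_linear_weights(cov)
print(weights, var_s)
```

The returned variance 1/(1' cov^{-1} 1) equals c*' cov c*, so var S is
obtained at no extra cost once the weights are known.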

(iii)  Gaussian over independent swindle (Andrews et al., 1972;
Hodges, 1967). A special assumption on f is needed, namely that the
observations $Y_i$, as in Section 2, have the form
$Y_i = \theta + Z_i/W_i$, where $Z_i \sim N(0,1)$ and is independent of
$W_i$. Now $\mathcal{Z}$ is the class of unbiased location-scale
equivariant estimators. Here the variance decomposition is performed
conditional on $(W_1,\ldots,W_n)$, so the problem of estimating
$\theta$ becomes that of estimating the slope in the normal theory
regression model $\tilde Y_i = W_i Y_i = \theta W_i + Z_i$, where the
$W_i$ are known but the $Z_i$ are not. The definition
$\tilde T(W,\tilde Y) = T(\tilde Y_1/W_1,\ldots,\tilde Y_n/W_n)$
associates a slope estimator $\tilde T$ in the regression model with
$T \in \mathcal{Z}$; further, since T is location-scale equivariant,
$\tilde T$ is regression equivariant (in the sense of Section 2) and
unbiased for $\theta$. To apply the decomposition (6.1), let
$S(W,\tilde Y) = \sum W_i \tilde Y_i/\sum W_i^2$ be the minimum variance
unbiased estimator of $\theta$. Thus conditional on W,

$\operatorname{Var} \tilde T = \operatorname{Var} S + \operatorname{Var}(\tilde T - S)$.

Finally, since $E(\tilde T \mid W) = 0$ under $\theta = 0$, take
expectations over W to express the unconditional variance of T as

$\operatorname{Var} T = E_0\bigl(1/\sum W_i^2\bigr) + E_0\,T^2(Y_i - S)$,

where $T(Y_i - S) = T - S$ by location equivariance.


(In fact, the second term on the right can be further decomposed
slightly by exploiting the independence of the normal theory mean and
variance estimators (cf. Simon, 1976), but the extra improvement in
precision that results is small relative to that obtained here.)
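The Gaussian over independent swindle in (iii) can be sketched in a few
lines. The example assumes Student t data with 3 degrees of freedom,
generated as Z/W with W^2 distributed as chi-square(nu)/nu, and takes
the sample median as the estimator T; these choices and the function
name `gi_swindle_t` are illustrative assumptions only.

```python
import math
import random
import statistics

def gi_swindle_t(nu=3, n=10, reps=4000, seed=0):
    """Naive and G/I-swindled estimates of var(median) for t_nu samples."""
    rng = random.Random(seed)
    sum_t2 = sum_inv = sum_d2 = 0.0
    for _ in range(reps):
        # W_i^2 = chi2(nu)/nu makes Z_i / W_i Student t with nu d.f.
        w = [math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu) for _ in range(n)]
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [zi / wi for zi, wi in zip(z, w)]
        sw2 = sum(wi * wi for wi in w)
        # Control S: least squares slope of W_i*Y_i on W_i.
        s = sum(wi * wi * yi for wi, yi in zip(w, y)) / sw2
        t = statistics.median(y)
        sum_t2 += t * t            # naive: average of T^2
        sum_inv += 1.0 / sw2       # conditional variance of S given W
        sum_d2 += (t - s) ** 2     # (T - S)^2
    naive = sum_t2 / reps
    swindled = (sum_inv + sum_d2) / reps
    return naive, swindled

naive, swindled = gi_swindle_t()
print(f"naive: {naive:.4f}  swindled: {swindled:.4f}")
```

The same loop serves any location-scale equivariant estimator: only the
line computing `t` changes.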


(iv)  The score function swindle. Define T to be locally unbiased for
$\theta$ at $\theta_0$ if $E_{\theta_0} T = \theta_0$ and
$\frac{\partial}{\partial\theta} E_\theta T\,\big|_{\theta=\theta_0} = 1$.
Choose $\mathcal{Z}$ as the affine space of estimators that are locally
unbiased at $\theta_0 = 0$. As in Section 3, under appropriate
regularity conditions, we have for statistics T(Y) of finite variance

(6.3)  $\frac{\partial}{\partial\theta} E_\theta T\,\big|_{\theta=0} = E_0 S_0 T$,

where $S_0$ is the score function for location. Normalizing $S_0$ to
give $S = S_0/E_0 S_0^2$ ensures from (6.3) that S belongs to
$\mathcal{Z}$, and further that it is uncorrelated with T - S for
$T \in \mathcal{Z}$. This yields the variance decomposition (6.1).

An analogous treatment is possible for the (successively smaller)
affine spaces $\mathcal{Z}_k$ consisting of estimators locally unbiased
up to order k: i.e., in addition to the properties above, we require
that $\frac{\partial^j}{\partial\theta^j} E_\theta T = 0$ for
$j = 2,\ldots,k$.
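A minimal sketch of the score function swindle in (iv), assuming
logistic data, for which the location score of one observation is
tanh(y/2) and the Fisher information per observation is 1/3. The
estimator (the median), the sample size and the name
`score_swindle_logistic` are assumptions made for illustration.

```python
import math
import random
import statistics

def score_swindle_logistic(n=10, reps=4000, seed=0):
    """Naive and score-swindled estimates of var(median) for logistic samples."""
    rng = random.Random(seed)
    info = n / 3.0                  # E_0 S_0^2 for a logistic sample of size n
    sum_t2 = sum_d2 = 0.0
    for _ in range(reps):
        y = []
        for _ in range(n):
            u = rng.random()
            while u == 0.0:         # avoid log(0)
                u = rng.random()
            y.append(math.log(u / (1.0 - u)))        # logistic variate
        s0 = sum(math.tanh(yi / 2.0) for yi in y)    # score S_0 = sum of -f'/f
        s = s0 / info                                # normalized control S
        t = statistics.median(y)
        sum_t2 += t * t
        sum_d2 += (t - s) ** 2
    naive = sum_t2 / reps
    swindled = 1.0 / info + sum_d2 / reps   # var S = 1/info is the C-R bound
    return naive, swindled

naive, swindled = score_swindle_logistic()
print(f"naive: {naive:.4f}  swindled: {swindled:.4f}")
```

No Gaussian over independent representation is needed here; only the
score function and the information of the sampling density are used.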
7.  FURTHER  APPLICATIONS

Stein effect and Bayesian robustness. Consider estimation of
$\theta = (\theta_1,\ldots,\theta_p)$ using independent observations
$X_i \sim N(\theta_i, \sigma^2)$, $i = 1,\ldots,p$, when loss in
estimation of $\theta$ by $\delta(x) = (\delta_1(x),\ldots,\delta_p(x))$
is measured by $|\delta(x) - \theta|^2 = \sum_i (\delta_i(x) - \theta_i)^2$
and risk by $R(\theta,\delta) = E_\theta|\delta(X) - \theta|^2$. It is
often of interest to study the integrated risk
$r(\pi,\delta) = \int R(\theta,\delta)\,\pi(d\theta)$ of a rule
$\delta$ and the Bayes risk $r(\pi) = \inf_\delta r(\pi,\delta)$
relative to a prior measure $\pi(d\theta)$. For example, Efron and
Morris (1972) and Berger (1982) have used $r(\pi,\delta)$ and
$r(\pi,\delta) - r(\pi)$ in studying the "relative savings risk" of
Stein-type estimators from empirical and robust Bayesian viewpoints
respectively.

A fixed prior $\pi(d\theta)$ determines, in conjunction with the
sampling model, a marginal measure $H(dx)$ and, under the quadratic loss
function L, an $L^2$ decomposition analogous to (5.1), namely

(7.1)  $r(\pi,\delta) = r(\pi) + \int |\delta(x) - \delta_\pi(x)|^2\,H(dx)$,

where $\delta_\pi(x) = E[\theta \mid x]$ is the Bayes rule minimizing
$r(\pi,\delta)$. The integral above is much easier to simulate (or
evaluate numerically) than
$r(\pi,\delta) = \int R(\theta,\delta)\,d\pi(\theta) = \int E[L(\theta,\delta) \mid x]\,H(dx)$.
There is a further saving if one compares several $\delta$ for a fixed
$\pi$ (as done in Berger (1982)), since $r(\pi)$ need only be evaluated
once. Although analytic expressions for $R(\theta,\delta)$ are
available for many of the rules $\delta$ of interest in the Gaussian
case, this special feature disappears for other location densities,
whereas the decomposition (6.1) persists.
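The decomposition (7.1) can be illustrated in the one case where
everything is available in closed form. The sketch assumes unit sampling
variance, a N(0, tau2 I) prior and the James-Stein rule
(1 - (p-2)/|x|^2) x; these choices and the name `js_excess_risk` are
assumptions for illustration. Under them the Bayes rule is
(tau2/(1+tau2)) x and the excess integrated risk equals 2/(1+tau2),
which gives a check on the simulation.

```python
import random

def js_excess_risk(p=5, tau2=1.0, reps=20000, seed=0):
    """Monte Carlo estimate of r(pi, delta_JS) - r(pi) under a N(0, tau2 I) prior."""
    rng = random.Random(seed)
    shrink = 1.0 / (1.0 + tau2)        # Bayes rule is (1 - shrink) * x
    marg_sd = (1.0 + tau2) ** 0.5      # marginally X_i ~ N(0, 1 + tau2)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, marg_sd) for _ in range(p)]   # draw from H(dx)
        norm2 = sum(xi * xi for xi in x)
        js_factor = 1.0 - (p - 2) / norm2
        bayes_factor = 1.0 - shrink
        # |delta_JS(x) - delta_pi(x)|^2: both rules are multiples of x.
        total += (js_factor - bayes_factor) ** 2 * norm2
    return total / reps

print(js_excess_risk())   # compare with 2 / (1 + tau2) = 1.0
```

Only draws from the marginal distribution are needed; the parameter
vector itself is never simulated.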


Multivariate  Location 


Let $T(y_1,\ldots,y_n)$ be an unbiased, location equivariant
estimate of the location parameters $\theta \in R^d$ based on n i.i.d.
observations from a smooth density $f(y - \theta)$ in $R^d$. In
principle the score function swindle extends directly, but we mention a
couple of interesting features. The score function 'statistic' is now a
vector $S^{(0)}$ with components $-\sum_i D_j f(y_i)/f(y_i)$,
$j = 1,\ldots,d$, having mean 0 and covariance matrix $\Sigma$.
Defining $S = \Sigma^{-1} S^{(0)}$ leads to the matrix decomposition

$\Sigma_T = \Sigma^{-1} + \Sigma_{T-S}$,

where $\Sigma_T$ is the covariance matrix of T. Note therefore that
covariances of the components of T can be swindled in addition to the
variances of the individual components. Such a swindle could be used to
study efficiency properties of, for example, the computationally costly
high-breakdown, affine-equivariant estimates of multivariate location
proposed by Donoho (1982) and Stahel (1982).

Discrete  Parameter  Spaces 

A discrete parameter version of the Cramer-Rao inequality (the
Hammersley-Chapman-Robbins inequality) leads to a natural analog of
the score function swindle. Suppose that Y has density
$p(y,\theta) > 0$ for all y in the sample space and
$\theta \in \Theta$. Fix $\theta_0 \in \Theta$ and $\Delta$ such that
$\theta_0 + \Delta \in \Theta$. Let
$\mathcal{Z} = \{T : E_{\theta_0+\Delta} T - E_{\theta_0} T = \Delta\}$.
Then the decomposition (5.1) holds with $S = \Delta\psi/E\psi^2$ and
$\psi = \{p(x,\theta_0+\Delta)/p(x,\theta_0)\} - 1$. This version of the
swindle can be of use, for example, in settings where the parameter
$\theta$ is restricted to lie in a lattice such as the integers, as in
the problem of estimation of molecular weight discussed by Hammersley
(1950).

Bootstrap  Estimates  of  Variance 

To take a specific example, suppose that observations
$x_1,\ldots,x_n$ are taken i.i.d. from distribution $F(x-\theta)$, and
we wish to use a translation invariant estimate $T(x_1,\ldots,x_n)$ to
estimate $\theta$. In contrast with the simulation contexts considered
earlier, it is not assumed here that F is known. Bootstrap estimates of
$\operatorname{Var}_F T(x_1,\ldots,x_n)$ are obtained by replacing F by
(some function of) its empirical distribution function $F_n$, say
$F_n^*$, and estimating $\operatorname{Var}_{F_n^*} T(x_1,\ldots,x_n)$
by drawing N i.i.d. samples from $F_n^*$ and then using the usual
variance estimator.

Consider the following modification of the score function procedure
to reduce the number N of "boots" required. Construct a density
estimate $f_n$ from $F_n$ (say by using kernel methods) such that an
empirical score function $f_n'/f_n$ and estimated information
$\int (f_n'/f_n)^2 f_n\,dx$ can be easily evaluated. Write $F_n^*$ for
the cdf corresponding to density $f_n$. Now draw i.i.d. samples from
$F_n^*$ and apply the location score function swindle, thus estimating
only $\operatorname{Var}_{F_n^*} T(X - S(X)\mathbf{1})$. This proposal
is speculative at present: work is in progress to evaluate the
improvements obtained in particular situations.


Suppose that the control statistic S satisfies (1.1) and let us
compare the swindle gain from the variance decomposition (1.1) (given
in (2.1)) with the swindle gain from the "regression estimate" (1.2) in
the overly-optimistic situation that
$b^* = -\operatorname{Cov}(\hat\sigma_S^2, \hat\sigma_T^2)/\operatorname{Var}(\hat\sigma_S^2)$
is known. In this case, assuming as in §2.1 that ET = ES = 0 and that
$\hat\sigma_S^2$ and $\hat\sigma_T^2$ are given by naive
method-of-moments estimators, we have

$\frac{\operatorname{Var}(\hat\sigma_T^2)}{\operatorname{Var}(\hat\sigma_{b^*}^2)} = [1 - \rho^2(\hat\sigma_S^2, \hat\sigma_T^2)]^{-1} = [1 - \rho^2(S^2, T^2)]^{-1}$.

To render the calculations simple, suppose to a first order
approximation that T and S are jointly normal with the same variance
and with correlation $\eta$. Then $\rho(S^2, T^2) = \eta^2$, and it is
easily checked from (2.1) that

$\frac{\operatorname{Var}(\hat\sigma_T^2)}{\operatorname{Var}(\hat\sigma_{T-S}^2)} = \frac{1}{(1-\eta^2)^2} = \frac{(1+\eta^2)^2}{1-\eta^4}\cdot\frac{\operatorname{Var}(\hat\sigma_T^2)}{\operatorname{Var}(\hat\sigma_{b^*}^2)}$,

so that under these crude conditions the swindle gains from (1.1) are
better than those from (1.2) by a factor of about 5 when $\eta = .8$
and by a factor of about 10 when $\eta = .9$.
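The comparison just made is simple arithmetic and can be checked
directly. The sketch assumes that, under the crude normal
approximation, the gain of the decomposition swindle is
1/(1 - eta^2)^2 and the gain of the regression estimate with known b*
is 1/(1 - eta^4); the function names are hypothetical.

```python
def gain_decomposition(eta):
    """Assumed gain of the variance decomposition swindle: 1 / (1 - eta^2)^2."""
    return 1.0 / (1.0 - eta ** 2) ** 2

def gain_regression(eta):
    """Assumed gain of the regression estimate with known b*: 1 / (1 - eta^4)."""
    return 1.0 / (1.0 - eta ** 4)

def gain_ratio(eta):
    """How much better the decomposition does; equals (1 + eta^2)/(1 - eta^2)."""
    return gain_decomposition(eta) / gain_regression(eta)

for eta in (0.8, 0.9):
    print(f"eta = {eta}: ratio = {gain_ratio(eta):.2f}")
```

The two ratios come out near 4.56 and 9.53, which is the source of the
round figures 5 and 10 quoted in the text.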

Suppose now that we wish to estimate Var $T_1$ - Var $T_2$ for
$T_1, T_2$ satisfying (1.1) for the same S. Again for simplicity,
assume that $T_1$, $T_2$ and S are jointly Gaussian with means 0, the
same variances (taken here to be 1), and
$\rho(T_1,S) = \rho(T_2,S) = \omega$, $\rho(T_1,T_2) = \rho$. It
follows that the optimal $b_1^*$ and $b_2^*$ above will be equal, and
hence that
$\hat\sigma^2_{T_1,b_1^*} - \hat\sigma^2_{T_2,b_2^*} = \hat\sigma^2_{T_1} - \hat\sigma^2_{T_2}$,
so that the regression swindle per se offers no improvement. However,
since $T_1$ and $T_2$ are correlated,

(8.1)  $N \operatorname{Var}(\hat\sigma^2_{T_1} - \hat\sigma^2_{T_2}) = \sigma^2_{T_1^2} + \sigma^2_{T_2^2} - 2\sigma_{T_1^2 T_2^2} = 4(1-\rho^2)$,

so that the precision of the difference is greater than that of each
individual term (here N is the number of Monte-Carlo trials). Does
the variance decomposition swindle (1.1) help here? Easy normal theory
calculations show that

(8.2)  $N \operatorname{Var}(\hat\sigma^2_{T_1-S} - \hat\sigma^2_{T_2-S}) = 4\{4(1-\omega)^2 - (1+\rho-2\omega)^2\}$.

Denoting $\omega/\rho$ by $\alpha$, then (8.2) is smaller than (8.1)
exactly when $\rho > 1/(2\alpha)$. Thus, for example, if all of
$T_1$, $T_2$ and S are equally correlated, the variance decomposition
dominates the simple difference when that correlation exceeds 1/2.
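The threshold at correlation 1/2 can be confirmed by evaluating the two
expressions (8.1) and (8.2) directly. The sketch assumes unit variances
and writes rho for the correlation of the two estimators and omega for
their common correlation with S; the function names are hypothetical.

```python
def naive_difference_var(rho):
    """N * Var of the naive difference of variance estimates, as in (8.1)."""
    return 4.0 * (1.0 - rho ** 2)

def swindled_difference_var(rho, omega):
    """N * Var of the swindled difference, as in (8.2)."""
    return 4.0 * (4.0 * (1.0 - omega) ** 2 - (1.0 + rho - 2.0 * omega) ** 2)

# Equal correlations below, at, and above 1/2.
for r in (0.4, 0.5, 0.6):
    print(r, naive_difference_var(r), swindled_difference_var(r, r))
```

Factoring (8.2) as 4(1 - rho)(3 + rho - 4 omega) and (8.1) as
4(1 - rho)(1 + rho) shows that the comparison depends only on whether
omega exceeds 1/2.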

8.B.  Swindles  for  Variance  Ratios 

Sometimes we are primarily interested in estimating an efficiency
Var T/Var S, where S has minimum variance amongst all estimators in
$\mathcal{Z}$. If Var S is known, then the improvement achieved by a
variance decomposition swindle can be measured simply by comparing
$\operatorname{Var}\widehat{\operatorname{Var}}(T-S)$ with
$\operatorname{Var}\widehat{\operatorname{Var}}\,T$ as above. If Var S
must also be estimated, then we can give a crude indication of the
improvement attained as follows. We continue to assume ES = ET = 0 and
to use moment estimators for Var S, Var T and Var(T-S). Thus, using the
generic labels s and t for replications $S(X^i)$ and $T(X^i)$ for
$i = 1,\ldots,N$, we seek to compare the variances of

(8.3)  $\hat e_1 = \sum t^2/\sum s^2$  and  $\hat e_2 = 1 + \sum (t-s)^2/\sum s^2$

(the naive and swindled estimates respectively). Making the rather
crude assumption that the values of t-s (small compared to s) are
independent of s and symmetrically distributed about zero, one finds
that

(8.4)  $E(t^2 \mid (t-s)^2 + s^2) = s^2 + (t-s)^2$,

so that certainly
$\operatorname{Var}(\sum t^2) \ge \operatorname{Var}(\sum s^2 + \sum (t-s)^2)$.
A more explicit expression for the difference in variances of
$\hat e_1$ and $\hat e_2$ follows by expanding $t^2$ as
$s^2 + 2s(t-s) + (t-s)^2$ and conditioning on all $s_i = S(X^i)$
values. From the independence and distributional assumptions on t-s,
we have $E(\hat e_1 \mid s_1,\ldots,s_N) = E(\hat e_2 \mid s_1,\ldots,s_N)$, so


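$\operatorname{Var}(\hat e_1) - \operatorname{Var}(\hat e_2) = E \operatorname{Var}\Bigl[\frac{2\sum s(t-s) + \sum (t-s)^2}{\sum s^2} \Bigm| s\Bigr] - E \operatorname{Var}\Bigl[\frac{\sum (t-s)^2}{\sum s^2} \Bigm| s\Bigr]$

$\qquad = E\Bigl\{\frac{1}{(\sum s^2)^2} \operatorname{Var}\bigl[2\sum s(t-s) \mid s\bigr]\Bigr\} = 4 \operatorname{Var}(t-s)\, E\Bigl[\frac{\sum s^2}{(\sum s^2)^2}\Bigr]$.

A further point to note is that $\hat e_2 \ge 1$, whereas it is
certainly possible for $\hat e_1$ to be less than 1, in contradiction
with the optimality of S.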
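A small simulation illustrates both points: the swindled efficiency
estimate varies less than the naive one, and it never falls below 1.
The sketch assumes s is standard normal and t - s is an independent
normal with standard deviation 0.3, with 50 replications per
experiment; all names and constants are illustrative assumptions.

```python
import random
import statistics

def efficiency_estimates(rng, n_rep=50, sd_diff=0.3):
    """One experiment: return (naive, swindled) estimates of Var T / Var S."""
    s = [rng.gauss(0.0, 1.0) for _ in range(n_rep)]
    d = [rng.gauss(0.0, sd_diff) for _ in range(n_rep)]   # d plays t - s
    ss = sum(si * si for si in s)
    tt = sum((si + di) ** 2 for si, di in zip(s, d))
    dd = sum(di * di for di in d)
    return tt / ss, 1.0 + dd / ss

def compare_variances(n_experiments=2000, seed=0):
    """Variances of the two estimates over many experiments, and min swindled."""
    rng = random.Random(seed)
    pairs = [efficiency_estimates(rng) for _ in range(n_experiments)]
    e1 = [p[0] for p in pairs]
    e2 = [p[1] for p in pairs]
    return statistics.variance(e1), statistics.variance(e2), min(e2)

v1, v2, lowest = compare_variances()
print(f"Var(naive) = {v1:.5f}, Var(swindled) = {v2:.5f}, min swindled = {lowest:.3f}")
```

With these settings the sum of squares of s is chi-square with 50
degrees of freedom, so the predicted difference in variances is
4 Var(t-s)/48.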

8.C.  Bhattacharya  Bounds

The Cramer-Rao bound is the first of a sequence of lower bounds
to the variance of an estimator that can be obtained by using
successively higher derivatives of the likelihood to build control
functions. In estimation of a single parameter $\theta$, these
Bhattacharya bounds take the form (e.g. Lehmann, 1983, p. 129)

(8.5)  $\operatorname{var}_\theta \delta \ge a' K^{-1}(\theta)\, a = B_p$,

where $\psi^{(j)} = [f_\theta(y)]^{-1}\,\partial^j/\partial\theta^j f_\theta(y)$,
$a'$ is a row matrix with entries
$a_j = \operatorname{cov}_\theta(\delta, \psi^{(j)})$, $j = 1,\ldots,p$,
and $K_{ij}(\theta) = \operatorname{Cov}_\theta(\psi^{(i)}, \psi^{(j)})$.
If $\delta(Y)$ is unbiased for $\theta$ (at least up to a constant),
then the lower bound becomes $[K^{-1}(\theta)]_{11}$, which by standard
matrix theory is an increasing function of p. When the Bhattacharya
bounds are strictly closer to the Pitman bound, they lead in principle
to more effective swindles. In practice, the cases discussed below
suggest that for $p \le 3$ or for moderate to large n (when the C-R
bound becomes quite good anyway), the improvement is not very
significant.


Consider the location problem, initially with p = 2, and
$f_\theta(y) = \prod_{i=1}^{n} f(y_i - \theta)$. If $\delta$ is
unbiased, then we easily calculate the percentage improvement in the
lower bound from

$B_2/B_1 = K_{11}K_{22}/(K_{11}K_{22} - K_{12}^2) = [1 - \rho^2(\psi^{(1)}, \psi^{(2)})]^{-1}$.

In fact $B_2/B_1$ is independent of $\theta$ in the location problem,
and if $\theta = 0$ and $\phi = -f'/f$, then

$\psi^{(1)}(y) = \sum_i \phi(y_i)$,  $\psi^{(2)}(y) = -\sum_i \phi'(y_i) + [\sum_i \phi(y_i)]^2$.

Now if f is symmetric about 0, then f, $\phi^2$ and $\phi'$ are even
functions while $\phi$ is odd, so that
$\operatorname{Cov}(\psi^{(1)}, \psi^{(2)}) = 0$. Thus the second
order Bhattacharya bound offers no improvement.


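Even when f is asymmetric, the gain decreases inversely with n.
Indeed

(8.6)  $\rho^2(\psi^{(1)}, \psi^{(2)}) = \frac{(E\phi\eta)^2}{E\phi^2\,[E\eta^2 + 2(n-1)(E\phi^2)^2]}$,

where $\eta = \eta(y_1) = (\phi^2 - \phi')(y_1)$. For a specific
example, let $f(y) = c_\sigma^{-1} n(y)$ for y > 0, and
$c_\sigma^{-1} n(y/\sigma)$ for y < 0, where
$n(y) = (2\pi)^{-1/2} e^{-y^2/2}$ and $c_\sigma = (1+\sigma)/2$. Then

$\rho^2(\psi^{(1)}, \psi^{(2)}) = \{\pi[1 + n\sigma/(\sigma-1)^2]\}^{-1}$.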
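The closed form for the two-piece normal example can be checked
numerically against the general expression (8.6). The sketch below
assumes the density is 2/(1+sigma) times n(y) for y > 0 and
2/(1+sigma) times n(y/sigma) for y < 0, takes phi = -f'/f and
eta = phi^2 - phi', and integrates each half-line with a midpoint rule;
the function names and the integration limits are illustrative
assumptions.

```python
import math

def rho2_closed_form(n, sigma):
    """Closed form 1 / (pi * (1 + n*sigma/(sigma-1)^2)) for the example."""
    return 1.0 / (math.pi * (1.0 + n * sigma / (sigma - 1.0) ** 2))

def midpoint(g, a, b, steps=20000):
    """Midpoint-rule integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

def rho2_numeric(n, sigma):
    """Evaluate (8.6) by numerical integration of the three moments."""
    c = 2.0 / (1.0 + sigma)          # normalizing constant of the density
    npdf = lambda y: math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

    def expect(g_pos, g_neg):
        right = midpoint(lambda y: g_pos(y) * c * npdf(y), 0.0, 12.0)
        left = midpoint(lambda y: g_neg(y) * c * npdf(y / sigma),
                        -12.0 * sigma, 0.0)
        return right + left

    phi_pos = lambda y: y                        # -f'/f for y > 0
    phi_neg = lambda y: y / sigma ** 2           # -f'/f for y < 0
    eta_pos = lambda y: y * y - 1.0              # phi^2 - phi' for y > 0
    eta_neg = lambda y: (y / sigma ** 2) ** 2 - 1.0 / sigma ** 2

    e_phi2 = expect(lambda y: phi_pos(y) ** 2, lambda y: phi_neg(y) ** 2)
    e_eta2 = expect(lambda y: eta_pos(y) ** 2, lambda y: eta_neg(y) ** 2)
    e_phi_eta = expect(lambda y: phi_pos(y) * eta_pos(y),
                       lambda y: phi_neg(y) * eta_neg(y))
    return e_phi_eta ** 2 / (e_phi2 * (e_eta2 + 2.0 * (n - 1) * e_phi2 ** 2))

for n in (5, 10, 20, 40):
    print(n, rho2_closed_form(n, 2.0), rho2_numeric(n, 2.0))
```

The printed values shrink roughly like 1/n, which is the sense in which
the second order bound matters little for moderate to large samples.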


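In the symmetric location case one is forced to look at third
order bounds, and it is easily shown that

$B_3/B_1 = [1 - \rho^2(\psi^{(1)}, \psi^{(3)})]^{-1}$,

and

$\rho^2(\psi^{(1)}, \psi^{(3)}) = \frac{(E\phi\zeta)^2}{E\phi^2\,[E\zeta^2 + 9(n-1)E\phi^2 E\eta^2 + 6(n-1)(n-2)(E\phi^2)^3]}$,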


where $\zeta = \zeta(y_1) = (\phi^3 - 3\phi\phi' + \phi'')(y_1)$. If f
is Cauchy, then calculation shows that

$\rho^2(\psi^{(1)}, \psi^{(3)}) = \frac{2}{n^2 + 3n + 5}$.

Thus the improvement will typically be quite small: in this case, for
n = 5, $B_3/B_1 = 1.0465$ for example.
TABLE  1

GAUSSIAN/INDEPENDENT  (Z/W)  DISTRIBUTIONS
(from  Simon  (1976))

Distribution                    W Drawn From

N(0, σ²)                        W² = 1/σ²
t_ν                             W² = χ²(ν)/ν
  (Cauchy = t_1)                W = |N(0,1)|
"Contaminated normal"           W² = 1 with prob. p,
                                W² = 1/σ² with prob. 1 - p
"Slash"                         W ~ U(0,1]
Laplace (Double Exponential)    f(w) = w^(-3) exp(-w^(-2)/2)
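The Gaussian/independent representations in Table 1 translate directly
into a sampler. The sketch below assumes Y = Z/W with Z standard normal
and W drawn independently, and works with W itself (so, for example,
W = sqrt(chi2(nu)/nu) for Student's t). The function names, the
distribution labels and the default parameter values are hypothetical.

```python
import math
import random

def draw_w(dist, rng, nu=3, p=0.9, sigma=3.0):
    """Draw the denominator W for the named Gaussian/independent family."""
    if dist == "normal":                 # N(0, sigma^2)
        return 1.0 / sigma
    if dist == "t":                      # Student t with nu d.f.
        return math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
    if dist == "cauchy":                 # t with 1 d.f.
        return abs(rng.gauss(0.0, 1.0))
    if dist == "contaminated":           # scale sigma with probability 1 - p
        return 1.0 if rng.random() < p else 1.0 / sigma
    if dist == "slash":                  # W uniform on (0, 1]
        return 1.0 - rng.random()
    if dist == "laplace":                # 1/W^2 exponential with mean 2
        v = rng.expovariate(0.5)
        while v == 0.0:
            v = rng.expovariate(0.5)
        return 1.0 / math.sqrt(v)
    raise ValueError(f"unknown distribution: {dist}")

def draw_y(dist, rng, **params):
    """Draw one observation Y = Z / W."""
    return rng.gauss(0.0, 1.0) / draw_w(dist, rng, **params)

rng = random.Random(0)
print([round(draw_y("slash", rng), 3) for _ in range(5)])
```

Keeping the W values alongside the Y values is what makes the
conditional variance decomposition of the Gaussian over independent
swindle available later.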


TABLE  3

GAIN  FACTORS  FOR  TWO  VARIANCE  DECOMPOSITION  SWINDLES

XSI:./: 

m.-. 

Li e 

in  i-l:  ■  u  t  Lizm  l  m  i 

2 _ 

4 

• 

AK0KJ 

CADG 

TOT 

n 

"  kwl 

z/v 

l/V 

—m 

10 

1.4 

1.9 

20.6 

3.4 

44.9 

7.9 

39.3  11.6 

20.4 

U.O 

1ZBSZCSZ  20 

6.4 

1.4 

47J 

3.3 

109.7 

A 

74.9  20.1 

39.4 

U.4 

10.3 

40 

12.6 

2.9 

113.0 

6.1 

304.9 

lev 

136.9  16.9 

94.4 

22.1 

19.9 

10 

1.2 

1.4 

2.4 

2.3 

24.9 

We9 

299.4  17.3 

747.9 

42.7 

mxa  20 

1.6 

1.3 

U.4 

2.0 

99.3 

•47.3  34.9 

2041.3 

49.4 

3.4 

40 

2.6 

1.6 

U.2 

3.0 

126.9 

* 

3129.7  29.4 

2032.6 

92.4 

6.2 

40 

11.9 

2.4 

202 

10 

1.0 

9.0 

nxseati 

>  20 

1.2 

1.4 

16.2 

2.4 

26.4 

4.0 

20.2 

10.1 

27.4 

4.9 

21.9 

4.4 

19.7 

6.3 

2.*  2.2 


40 

1.9 

1.3 

10.3 

3.0 

32 

10 

1.0 

3.0 

1.0 

1.0 

tombed 

20 

1.0 

1.0 

3.6 

1.3 

40 

1.3 

1.2 

3.2 

1.4 

1.0  1.6 


163.2 

469.2 

494.6 

1966.1 

7066.0 
Figure 1.  N x Var for Pitman and Best Weighted Least Squares Estimates compared to Cramer-Rao Bound (CRLB)
for Student's t Distributions on ν = 1, 2, 4, 8, 16, ∞ d.f. and N = 23.
Figure 3.  Swindle gains for the Estimators in Table 3 are superimposed on Figure 1. Areas of circles are
inversely proportional to swindle gains (zero gain would give a circle with about twice the
radius of the largest circle for x5). Note that swindle gains increase with efficiency of the
Monte Carlo swindles, or variance reduction techniques, exploit the
experimenter's knowledge of the stochastic structure governing the
simulated data to construct more precise estimates of unknown
parameters. Alternatively, one can reduce the number of replications
(and thus the cost) needed to gain a desired level of precision. This
paper reviews the common case of swindles based on variance
decompositions for estimating efficiencies and variances of location
and regression estimators. We then propose a new swindle based on
Fisher's efficient score function that can be applied to a much wider
range of situations than can the Gaussian-over-independent swindles
used in many studies of robust estimators. We compare these methods by
performing simulations for the efficiencies of location estimates and
by placing them in a simple geometric framework. We illustrate the use
of the score function swindle in estimating the variances of Pitman
estimates of location for samples from the t-distribution at selected
degrees of freedom. Finally, we sketch applications to scale
estimation, exponential regression, statistical decision theory, and
bootstrap computations.